Where is Randomness Needed to Break the 
Square-Root Bottleneck? 



Patrick Kuppinger, Giuseppe Durisi, and Helmut Bölcskei
ETH Zurich, 8092 Zurich, Switzerland 
E-mail: {patricku, gdurisi, boelcskei}@nari.ee.ethz.ch 



Abstract: As shown by Tropp, 2008, for the concatenation of two orthonormal bases (ONBs), breaking the square-root bottleneck in compressed sensing does not require randomization over all the positions of the nonzero entries of the sparse coefficient vector. Rather, the positions corresponding to one of the two ONBs can be chosen arbitrarily. The two-ONB structure is, however, restrictive and does not reveal the property that makes it possible to break the bottleneck with reduced randomness. For general dictionaries, we show that if a sub-dictionary with small enough coherence and large enough cardinality can be isolated, the bottleneck can be broken under the same probabilistic model on the sparse coefficient vector as in the two-ONB case.

I. Introduction 

The central idea underlying compressed sensing (CS) is to recover a sparse signal from as few non-adaptive linear measurements as possible [1], [2]. Given the measurement outcome $y \in \mathbb{C}^M$ and the measurement matrix $D \in \mathbb{C}^{M \times N}$ ($M < N$), often referred to as dictionary,¹ we want to find the sparsest coefficient vector $x \in \mathbb{C}^N$ that is consistent with the measurement outcome, i.e., that satisfies $y = Dx$. This problem can be formalized as follows:

    (P0)  find $\arg\min \|x\|_0$ subject to $y = Dx$.

Here, $\|x\|_0$ denotes the number of nonzero entries of the vector x. Unfortunately, solving (P0) for practically relevant problem sizes N, M is infeasible as it requires a combinatorial search. Instead, the CS literature has focused on the convex relaxation of (P0), i.e., on the following $\ell_1$-minimization problem:

    (P1)  find $\arg\min \|x\|_1$ subject to $y = Dx$

commonly referred to as basis pursuit (BP) [3]-[8]. Here, $\|x\|_1 = \sum_i |x_i|$ denotes the $\ell_1$-norm of x. Since (P1) can be cast as a linear program (in the real case) or a second-order cone program (in the complex case), it can be solved more efficiently than (P0).

It is now natural to ask under which conditions the solutions of (P0) and (P1) are unique and coincide. A sufficient condition for this to happen² [4]-[6] is $\|x\|_0 < S$, where the sparsity threshold $S = (1 + 1/d)/2$ depends on the dictionary coherence $d = \max_{i \neq j} |d_i^H d_j|$. Sparsity thresholds S larger than $(1 + 1/d)/2$ can be established if more information on the dictionary is available [7]-[9], e.g., if the dictionary consists of the concatenation of two or more orthonormal bases (ONBs), or, more generally, if a sufficiently large sub-dictionary with coherence much smaller than d can be isolated [9]. We emphasize that the results in [4]-[9] apply to all vectors x with $\|x\|_0 < S$, irrespective of the positions and the values of the nonzero entries of x.

¹Throughout the paper, we assume that the columns of D have unit $\ell_2$-norm, i.e., $\|d_j\|_2 = 1$ for $j = 1, \ldots, N$.

The line of work presented in [4]-[9] leads to sparsity thresholds S that are on the order of 1/d. From the Welch lower bound [10]

    $d \geq \sqrt{(N - M)/[M(N - 1)]}$

we can conclude that the thresholds in [4]-[9] are at best on the order of $\sqrt{M}$ (for $N \gg M$). This scaling behavior is sometimes referred to as the square-root bottleneck. A better scaling behavior can be obtained by asking for sparsity thresholds that hold for almost all, rather than all (as in [4]-[9]), vectors x, or, more precisely, by asking for sparsity thresholds that hold with high probability, given a probabilistic model on x.³ Following the terminology used in [11], we refer to sparsity thresholds that hold for almost all S-sparse vectors x as robust sparsity thresholds.
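The coherence, the Welch lower bound, and the deterministic threshold $(1 + 1/d)/2$ are easy to evaluate numerically. The following is a minimal sketch in plain Python; the example dictionary (identity basis concatenated with the unitary DFT basis, M = 16) and all function names are illustrative assumptions and do not come from the paper.

```python
import cmath
import math


def identity_column(M, k):
    """k-th column of the M x M identity matrix."""
    return [1.0 + 0j if m == k else 0j for m in range(M)]


def unit_dft_column(M, k):
    """k-th column of the M x M unitary DFT matrix."""
    return [cmath.exp(-2j * math.pi * k * m / M) / math.sqrt(M) for m in range(M)]


def coherence(columns):
    """Largest |<d_i, d_j>| over all pairs of distinct unit-norm columns."""
    best = 0.0
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            inner = sum(u.conjugate() * v for u, v in zip(columns[i], columns[j]))
            best = max(best, abs(inner))
    return best


def welch_bound(M, N):
    """Welch lower bound on the coherence of N unit-norm vectors in C^M."""
    return math.sqrt((N - M) / (M * (N - 1)))


M = 16
# Hypothetical example dictionary: identity basis followed by the DFT basis.
D = [identity_column(M, k) for k in range(M)] + [unit_dft_column(M, k) for k in range(M)]
N = len(D)

d = coherence(D)
S = (1 + 1 / d) / 2  # deterministic sparsity threshold (1 + 1/d)/2

print(f"M = {M}, N = {N}")
print(f"coherence d = {d:.4f}")
print(f"Welch bound = {welch_bound(M, N):.4f}")
print(f"threshold S = {S:.2f}")
```

For this pair of bases the coherence equals $1/\sqrt{M}$, so the threshold grows only like $\sqrt{M}$, which is the square-root bottleneck in miniature.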

The improvements in the scaling behavior that result from the relaxation to robust sparsity thresholds will, of course, depend on the probabilistic model on x [11]-[12]. A widely used probabilistic model for n-sparse vectors x is to choose the positions of the n nonzero entries (i.e., the sparsity pattern) of x uniformly at random among all possible $\binom{N}{n}$ support sets of cardinality n. The values of these nonzero entries of x are drawn from a continuous probability distribution, with the additional constraint that their phases are i.i.d. and uniformly distributed on $[0, 2\pi)$ [11], [12]. For this probabilistic model it is shown in [12] that the square-root bottleneck can be broken. More specifically, the main result in [12] states that, assuming a dictionary with coherence on the order of $1/\sqrt{M}$, a robust sparsity threshold on the order of $M/(\log N)$ can be obtained. Put differently, this result shows that to recover almost all vectors x with S nonzero entries, the required number of non-adaptive linear measurements M is (order-wise) $S \log N$ instead of $S^2$.

²In the remainder of the paper, whenever we speak of a vector x, we implicitly assume that this vector is consistent with the observation y, i.e., it satisfies $y = Dx$.

³An alternative approach, which we do not pursue in this paper, is to introduce a probabilistic model on the dictionary D [1], [2].

Remarkably, for dictionaries that consist of the concatenation of two ONBs, robust sparsity thresholds on the order of $M/(\log N)$ can be obtained with reduced randomness as compared to the case of general dictionaries. Specifically, it was found in [11], [12] that it suffices to pick the positions of the nonzero entries of x corresponding to one of the two ONBs uniformly at random, while the positions of the remaining nonzero entries can be chosen arbitrarily. The probabilistic model on the values of the nonzero entries of x (corresponding to both ONBs) remains the same as for the general dictionaries considered in [12].
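The reduced-randomness model described above can be sketched as a small sampling routine. This is an illustrative sketch only: the function name `draw_sparse_vector`, the uniform magnitude distribution on (0.5, 1.5), and the example sizes are assumptions made here; the text only requires some continuous distribution for the values and i.i.d. uniform phases.

```python
import cmath
import math
import random


def draw_sparse_vector(N_a, N_b, support_a, n_b, rng):
    """Draw x = [x_a; x_b] with a fixed support in A and a random support in B."""
    # Positions in B: uniformly at random among all subsets of size n_b.
    support_b = sorted(rng.sample(range(N_b), n_b))
    x = [0j] * (N_a + N_b)
    for pos in list(support_a) + [N_a + j for j in support_b]:
        magnitude = rng.uniform(0.5, 1.5)      # assumed continuous distribution
        phase = rng.uniform(0.0, 2 * math.pi)  # i.i.d. uniform phase
        x[pos] = cmath.rect(magnitude, phase)
    return x, support_b


rng = random.Random(0)
M = 8  # two ONBs for C^M, so N_a = N_b = M
x, support_b = draw_sparse_vector(N_a=M, N_b=M, support_a=[0, 1, 2], n_b=2, rng=rng)
nonzeros = [i for i, v in enumerate(x) if v != 0]
print("nonzero positions:", nonzeros)
```

Only the call to `rng.sample` randomizes positions; the support inside the first basis is passed in by the caller and can be chosen adversarially.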

Contributions: The two-ONB result in [11], [12] is interesting as it shows that one need not choose the locations of all the nonzero entries of the sparse vector randomly to break the square-root bottleneck. However, the two-ONB structure is restrictive and does not reveal which property of the dictionary makes it possible to break the square-root bottleneck with reduced randomness. The two ONBs are on equal footing.

The purpose of this paper is twofold. First, we extend the two-ONB result in [11], [12] to general dictionaries. Second, by virtue of this extension, we show that, for a general dictionary D with low coherence d, the fundamental property needed to break the square-root bottleneck with reduced randomness is the presence of a sufficiently large sub-dictionary A with coherence much smaller than d. The positions of the nonzero entries of x corresponding to A can be chosen arbitrarily, and the positions of the remaining nonzero entries must be chosen randomly. Naturally, the larger the sub-dictionary A, the more significant the reduction in randomness becomes. Randomization over the remaining part of the dictionary ensures that the sparsity patterns that cannot be recovered through BP occur with small enough probability. More formally, we prove the following result. Consider a general dictionary D with coherence on the order of $1/\sqrt{M}$ that contains a sub-dictionary A with coherence on the order of $(\log N)/M$ and cardinality at least on the order of $M/(\log N)$. Then, a robust sparsity threshold on the order of $M/(\log N)$ can be established, and hence the square-root bottleneck is broken, under the same probabilistic model on the vector x as in the two-ONB case, whenever the spectral norms of A and of the sub-dictionary containing the remaining columns of D satisfy certain technical conditions. These technical conditions are trivially satisfied, e.g., for dictionaries that consist of two tight frames.

Our analysis relies heavily on the mathematical tools developed in [12] for the two-ONB setting.

Notation: Throughout the paper, we use lowercase boldface letters for column vectors, e.g., x, and uppercase boldface letters for matrices, e.g., D. For a given matrix D, we denote its conjugate transpose by $D^H$, and $d_i$ stands for its ith column. The spectral norm of a matrix D is $\|D\| = \sqrt{\lambda}$, where $\lambda$ is the maximum eigenvalue of $D^H D$. The minimum and maximum singular value of a matrix D are denoted by $\sigma_{\min}(D)$ and $\sigma_{\max}(D)$, respectively, rank(D) stands for the rank of D, and $\|D\|_{1,2} = \max_i \{\|d_i\|_2\}$. We use $I_n$ to denote the $n \times n$ identity matrix, and 0 stands for the all-zero matrix of appropriate size. The natural logarithm is denoted as log. For two functions f(M) and g(M), the notation $f(M) = O(g(M))$ means that $\lim_{M \to \infty} |f(M)|/|g(M)|$ is bounded above by a finite constant, and $f(M) = \Theta(g(M))$ means that there exist two positive finite constants $k_1$ and $k_2$ such that $k_1 \leq \lim_{M \to \infty} |f(M)|/|g(M)| \leq k_2$. Whenever we say that a vector $x \in \mathbb{C}^N$ has a randomly chosen sparsity pattern of cardinality n, we mean that the support set of x is chosen uniformly at random among all $\binom{N}{n}$ possible support sets of cardinality n.



II. Brief Review of Previous Relevant Results 

Robust sparsity thresholds for dictionaries consisting of two ONBs were first obtained in [11] and later improved in [12]. In Theorem 1 below, we restate the result in [12] in a slightly modified form, which is better suited to draw parallels to the more general case. The theorem follows by combining Theorems D, 13, and 14 in [12].

Theorem 1: Assume that⁴ N > 2. Let $D \in \mathbb{C}^{M \times N}$ be the concatenation of two ONBs A and B for $\mathbb{C}^M$ (i.e., N = 2M) and denote the coherence of D as d. Fix s > 1. Let the vector $x \in \mathbb{C}^N$ have an arbitrarily chosen sparsity pattern of $n_a$ nonzero entries corresponding to columns of sub-dictionary A and a randomly chosen sparsity pattern of $n_b$ nonzero entries corresponding to columns of sub-dictionary B. Suppose that

    $n_a + n_b < \min\{c\,d^{-2}/(s \log N),\ d^{-2}/2\}$    (1)

where c is no smaller than 0.004212. If the values of all nonzero entries of x are drawn from a continuous probability distribution, x is the unique solution of (P0) with probability exceeding $(1 - N^{-s})$. Furthermore, if $n_a$ and $n_b$, in addition to (1), satisfy

    $n_a + n_b < d^{-2}/[8(s + 1)\log N]$    (2)

and the phases of all nonzero entries of x are i.i.d. and uniformly distributed on $[0, 2\pi)$, then x is the unique solution of both (P0) and (P1) with probability exceeding $(1 - 3N^{-s})$.

⁴In [12], M > 3 (and hence N > 6) is assumed. However, it can be shown that N > 2 is sufficient to establish the result.



Interpretation of Theorem 1: Assume that D has coherence $d = O(1/\sqrt{M})$. As a consequence of (1) and (2), Theorem 1 establishes (under certain technical conditions on the values of the nonzero entries of x) the robust sparsity threshold⁵ $S \geq n_a + n_b = \Theta(M/(\log N))$.

This result is interesting as it shows that we do not need the entire sparsity pattern of x to be chosen at random; rather, the positions of the nonzero entries corresponding to one of the two ONBs can be chosen arbitrarily.

In the following section, we first present (in Theorem 2) an extension of the two-ONB result in [11], [12] to general dictionaries. As a consequence of Theorem 2, we then establish that, for a general dictionary D with low coherence d, the fundamental property that makes it possible to break the square-root bottleneck with reduced randomness is the presence of a sufficiently large sub-dictionary A with coherence much smaller than d.

III. Main Results 

Consider a dictionary D = [A B], where the sub-dictionary A has $N_a$ elements (i.e., columns) and coherence a, and the sub-dictionary B has $N_b = N - N_a$ elements and coherence b. The set of all such dictionaries is denoted as $\mathcal{D}(d, a, b)$. Correspondingly, we view the vector x as the concatenation of the two vectors $x_a \in \mathbb{C}^{N_a}$ and $x_b \in \mathbb{C}^{N_b}$ such that $y = Dx = A x_a + B x_b$. Since A and B are sub-dictionaries of D, we have $a, b \leq d$. We now state our main result.

Theorem 2: Assume that N > 2. Let D = [A B] be a dictionary in $\mathcal{D}(d, a, b)$. Fix s > 1 and $\gamma \in [0, 1]$. Consider a random vector $x = [x_a^T\ x_b^T]^T$ where $x_a$ has an arbitrarily chosen sparsity pattern of cardinality $n_a$ such that

    $6\sqrt{2}\,\sqrt{n_a d^2 s \log N} + 2(n_a - 1)a < (1 - \gamma)\,e^{-1/4}$    (3)

and $x_b$ has a randomly chosen sparsity pattern of cardinality $n_b$ such that

    $24\sqrt{n_b b^2 s \log N} + \frac{4 n_b}{N_b}\|B\|^2 + 2\sqrt{\frac{n_b}{N_b}}\,\|A\|\|B\| < \gamma\,e^{-1/4}$.    (4)

Furthermore, assume that the total number of nonzero entries of x satisfies

    $n_a + n_b < d^{-2}/2$.    (5)

Then, if the values of all nonzero entries of x are drawn from a continuous probability distribution, x is the unique solution of (P0) with probability exceeding $(1 - N^{-s})$. Furthermore, if $n_a$ and $n_b$, in addition to (3)-(5), satisfy

    $n_a + n_b < d^{-2}/[8(s + 1)\log N]$    (6)

and the phases of all nonzero entries of x are i.i.d. and uniformly distributed on $[0, 2\pi)$, then x is the unique solution of both (P0) and (P1) with probability exceeding $(1 - 3N^{-s})$.

⁵Whenever for some function g(M, N) we write $\Theta(g(M, N))$ or $O(g(M, N))$, we mean that the ratio N/M remains fixed while $M \to \infty$.

Proof: The proof is based on the following lemma, which is the main technical result of this paper and whose proof can be found in Appendix A.

Lemma 1: Fix s > 1 and $\gamma \in [0, 1]$. Let S be a sub-dictionary of $D = [A\ B] \in \mathcal{D}(d, a, b)$ that contains $n_a$ arbitrarily chosen columns of A and $n_b$ columns of B chosen uniformly at random. If $n_a$ and $n_b$ satisfy conditions (3) and (4), then the minimum singular value $\sigma_{\min}(S)$ of the sub-dictionary S obeys

    $P\{\sigma_{\min}(S) \leq 1/\sqrt{2}\} \leq N^{-s}$.

The proof of Theorem 2 is then obtained from Lemma 1 and the results in [12] as follows. The sparsity pattern of x assumed in the statement of Theorem 2 induces a sub-dictionary S of D containing $n_a$ arbitrarily chosen columns of A and $n_b$ randomly chosen columns of B. As a consequence of Lemma 1, the smallest singular value of S exceeds $1/\sqrt{2}$ with probability at least $(1 - N^{-s})$. This property of the sub-dictionary S, together with condition (5) and the requirement that the values of all nonzero entries of x are drawn from a continuous probability distribution, implies, as a consequence of [12, Thm. 13], that x is the unique solution of (P0) with probability at least $(1 - N^{-s})$. If, in addition, condition (6) is satisfied and the phases of all nonzero entries of x are i.i.d. and uniformly distributed on $[0, 2\pi)$, we can apply [12, Thm. 14] (with $\delta = N^{-s}$) to infer that x is the unique solution of both (P0) and (P1) with probability at least $(1 - N^{-s})(1 - 2N^{-s}) \geq (1 - 3N^{-s})$. ■
Interpretation of Theorem 2: We next present an interpretation of our result and reveal the fundamental property that makes it possible to break the square-root bottleneck with reduced randomness. In particular, we determine conditions on the dictionary such that both $n_a = \Theta(M/(\log N))$ and $n_b = \Theta(M/(\log N))$. As a consequence, a robust sparsity threshold $S \geq n_a + n_b = \Theta(M/(\log N))$ is established. In the following, for clarity of exposition, we only consider the dependency of $n_a$ and $n_b$ on the dictionary parameters d, a, b, $N_a$, $N_b$, and the spectral norms of A and B, and absorb all constants that are independent of these quantities in $c(\gamma, s)$, where $\gamma$ and s are defined in Theorem 2. Note that $c(\gamma, s)$ can change its value at each appearance. Condition (3) together with $n_a \leq N_a$ yields the following constraint on $n_a$:

    $n_a \leq c(\gamma, s)\,\min\{d^{-2}/(\log N),\ a^{-1},\ N_a\}$.

This constraint is compatible with $n_a = \Theta(M/(\log N))$ if the following three requirements are fulfilled:

i) the coherence of D satisfies $d = O(1/\sqrt{M})$

ii) the coherence of A satisfies $a = O((\log N)/M)$

iii) the cardinality of A satisfies $N_a \geq c\,M/(\log N)$

where c is a constant that can change at each appearance.
Condition (4), which can be rewritten as

    $n_b \leq c(\gamma, s)\,\min\left\{\frac{b^{-2}}{\log N},\ \frac{N_b}{\|B\|^2},\ \frac{N_b}{\|A\|^2\|B\|^2}\right\}$    (7)

is more laborious to interpret. For the constraint (7) to be compatible with $n_b = \Theta(M/(\log N))$, we need requirement i) above to be fulfilled (recall that $b \leq d$), together with the following two requirements on the spectral norms of B and A, namely

iv) $\|B\|^2 \leq c\,N_b(\log N)/M$
v) $\|A\|^2 \leq c\,N_b(\log N)/(\|B\|^2 M)$.

We finally note that when the requirements i) - v) are met, conditions (5) and (6), which can then be rewritten as $n_a + n_b < c\,M$ and $n_a + n_b < c\,M/(\log N)$, respectively, are compatible with both $n_a = \Theta(M/(\log N))$ and $n_b = \Theta(M/(\log N))$.

Hence, a robust sparsity threshold $S \geq n_a + n_b = \Theta(M/(\log N))$ can be established under the same probabilistic model on x as in the two-ONB case; namely, the positions of the nonzero entries of x corresponding to B have to be chosen randomly, while the positions of the nonzero entries of x corresponding to A can be chosen arbitrarily.

The requirements iv) and v) are difficult to interpret because they depend on the spectral norms of the sub-dictionaries A and B. To get more insight into these two requirements, we consider the special case of A and B being tight frames for $\mathbb{C}^M$ [14] (with the frame elements $\ell_2$-normalized to one). Then, $\|A\|^2 = N_a/M$ and $\|B\|^2 = N_b/M$, so that iv) is trivially satisfied and v) reduces to $N_a \leq c\,M \log N$. However, because of the Welch lower bound [10], condition ii) puts a more stringent restriction on the cardinality $N_a$ for large M. Hence, a robust sparsity threshold of $\Theta(M/(\log N))$ is obtained, under the same probabilistic model on the vector x as in the two-ONB case, if the coherence of sub-dictionary A satisfies $a = O((\log N)/M)$.
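The tight-frame reduction can be spelled out as a short worked substitution. The sketch below reads requirements iv) and v) as $\|B\|^2 \leq c\,N_b(\log N)/M$ and $\|A\|^2 \leq c\,N_b(\log N)/(\|B\|^2 M)$, and uses the unit-norm tight-frame identities $\|A\|^2 = N_a/M$ and $\|B\|^2 = N_b/M$; as in the text, c is a generic constant that may change at each appearance.

```latex
\begin{align*}
\text{iv)}\quad & \|B\|^2 = \frac{N_b}{M} \le c\,\frac{N_b \log N}{M}
  \iff 1 \le c \log N, \\
\text{v)}\quad & \|A\|^2 = \frac{N_a}{M} \le c\,\frac{N_b \log N}{\|B\|^2 M}
  = c\,\frac{N_b \log N}{(N_b/M)\,M} = c \log N
  \iff N_a \le c\,M \log N .
\end{align*}
```

Requirement iv) therefore holds for a suitable constant regardless of $N_b$, and v) only limits how large A may be relative to $M \log N$.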

A simple dictionary that satisfies i) - v): For $M = p^k$, with p prime and $k \in \mathbb{N}^+$, a dictionary D with coherence equal to $1/\sqrt{M}$ can be obtained by concatenating M + 1 ONBs for $\mathbb{C}^M$. Since D constitutes a tight frame for $\mathbb{C}^M$, by [12] a robust sparsity threshold of $\Theta(M/(\log N))$ is obtained by randomizing over all positions of the nonzero entries of x. Note, however, that we can write D = [A B], where A is an ONB (a = 0) and B is the concatenation of the remaining M ONBs and hence a tight frame for $\mathbb{C}^M$. As $N_a = M$, the requirements iii) and v) are satisfied. Therefore, by the results of the previous paragraph, a robust sparsity threshold of $\Theta(M/(\log N))$ is obtained by randomizing only over the positions of the nonzero entries of x corresponding to B.
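A dictionary of this type can be built and checked numerically. The paper does not say which construction it has in mind; the sketch below assumes the standard quadratic-phase construction of mutually unbiased bases, which covers only the case k = 1 with p an odd prime, and the function names are illustrative.

```python
import cmath
import math


def mub_dictionary(p):
    """Return (A, B): A is the identity basis, B stacks p quadratic-phase bases.

    Assumes p is an odd prime; every column is a list of length p.
    """
    omega = cmath.exp(2j * math.pi / p)
    A = [[1 + 0j if m == k else 0j for m in range(p)] for k in range(p)]
    B = [
        [omega ** ((a * m * m + b * m) % p) / math.sqrt(p) for m in range(p)]
        for a in range(p)  # index of the basis
        for b in range(p)  # index of the vector within basis a
    ]
    return A, B


def inner(u, v):
    return sum(x.conjugate() * y for x, y in zip(u, v))


def coherence(columns):
    return max(
        abs(inner(columns[i], columns[j]))
        for i in range(len(columns))
        for j in range(i + 1, len(columns))
    )


p = 5  # M = p
A, B = mub_dictionary(p)
D = A + B

d = coherence(D)  # coherence of the full dictionary
a = coherence(A)  # coherence of sub-dictionary A (an ONB)

# Frame operator B B^H; for a tight frame it equals (N_b / M) * identity.
frame_op = [
    [sum(col[r] * col[c].conjugate() for col in B) for c in range(p)]
    for r in range(p)
]
tight = all(
    abs(frame_op[r][c] - (len(B) / p if r == c else 0)) < 1e-9
    for r in range(p)
    for c in range(p)
)
print(f"N = {len(D)}, d = {d:.4f}, 1/sqrt(M) = {1 / math.sqrt(p):.4f}, tight B: {tight}")
```

The check confirms the three properties used in the text for this small instance: D has coherence $1/\sqrt{M}$, A has coherence zero, and B is a tight frame with $\|B\|^2 = N_b/M$.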



Appendix A 
Proof of Lemma 1

Since the minimum singular value $\sigma_{\min}(S)$ of the sub-dictionary S can be lower-bounded as $\sigma_{\min}^2(S) \geq 1 - \|S^H S - I_{n_a+n_b}\|$, we have

    $P\{\sigma_{\min}(S) \leq 1/\sqrt{2}\} = P\{\sigma_{\min}^2(S) \leq 1/2\}$
    $\leq P\{1 - \|S^H S - I_{n_a+n_b}\| \leq 1/2\}$
    $= P\{\|S^H S - I_{n_a+n_b}\| \geq 1/2\}$.    (8)

Next, we quantify the tail behavior of the random variable $H = \|S^H S - I_{n_a+n_b}\|$, which will then lead to an upper bound on the probability of $\sigma_{\min}(S)$ falling below $1/\sqrt{2}$. To this end, the following lemma will be useful.

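Lemma 2 ([12, Prop. 10]): If the moments of the nonnegative random variable R can be upper-bounded as $[E(R^q)]^{1/q} \leq \alpha\sqrt{q} + \beta$ for all $q \geq Q \in \mathbb{Z}^{+}$, where $\alpha, \beta \in \mathbb{R}^{+}$, then

    $P\{R \geq e^{1/4}(\alpha u + \beta)\} \leq e^{-u^2/4}$

for all $u \geq \sqrt{Q}$.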



To be able to apply Lemma 2 to $H = \|S^H S - I_{n_a+n_b}\|$, we first need an upper bound on $[E(H^q)]^{1/q}$ that is of the form $\alpha\sqrt{q} + \beta$. We start by writing the sub-dictionary S as $S = [S_a\ S_b]$, where $S_a$ and $S_b$ denote the matrices containing the columns chosen arbitrarily from A and randomly from B, respectively. We then obtain

    $H = \left\| \begin{bmatrix} S_a^H S_a - I_{n_a} & S_a^H S_b \\ S_b^H S_a & S_b^H S_b - I_{n_b} \end{bmatrix} \right\|$.

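Applying the triangle inequality for operator norms, we can now upper-bound H according to

    $H \leq \left\| \begin{bmatrix} S_a^H S_a - I_{n_a} & 0 \\ 0 & S_b^H S_b - I_{n_b} \end{bmatrix} \right\| + \left\| \begin{bmatrix} 0 & S_a^H S_b \\ S_b^H S_a & 0 \end{bmatrix} \right\|$
    $\leq \max\{\|S_a^H S_a - I_{n_a}\|,\ \|S_b^H S_b - I_{n_b}\|\} + \|S_a^H S_b\|$
    $\leq \|S_a^H S_a - I_{n_a}\| + \|S_b^H S_b - I_{n_b}\| + \|S_a^H S_b\|$    (9)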



where the second inequality follows because the spectral norm of both a block-diagonal matrix and an anti-block-diagonal matrix is given by the largest among the spectral norms of the individual nonzero blocks. Next, we define $H_a = \|S_a^H S_a - I_{n_a}\|$, $H_b = \|S_b^H S_b - I_{n_b}\|$, and $Z = \|S_a^H S_b\|$. It then follows from (9) that for all $q \geq 1$

    $[E(H^q)]^{1/q} \leq [E((H_a + H_b + Z)^q)]^{1/q}$
    $\leq [E(H_a^q)]^{1/q} + [E(H_b^q)]^{1/q} + [E(Z^q)]^{1/q}$
    $= H_a + [E(H_b^q)]^{1/q} + [E(Z^q)]^{1/q}$    (10)

where the second inequality is a consequence of the triangle inequality for the norm $[E(|\cdot|^q)]^{1/q}$ (recall that $q \geq 1$), and in the last step we used the fact that $H_a$ is a deterministic quantity. All expectations in (10) are with respect to the random choice of columns from sub-dictionary B.

We next upper-bound the three terms on the right-hand side (RHS) of (10) individually. Applying Geršgorin's disc theorem [13, Th. 6.1.1] to the first term, we obtain

    $H_a \leq (n_a - 1)a$.    (11)
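The Geršgorin bound on the first term can be checked on a toy example. The sketch below is restricted to real-valued columns for simplicity, uses three hypothetical unit-norm columns that are not taken from the paper, and estimates the spectral norm by power iteration, which can only underestimate the true norm.

```python
import math


def gram_minus_identity(columns):
    """G = S^H S - I for real unit-norm columns (zero diagonal)."""
    n = len(columns)
    return [
        [0.0 if i == j else sum(u * v for u, v in zip(columns[i], columns[j]))
         for j in range(n)]
        for i in range(n)
    ]


def spectral_norm_symmetric(G, iters=200):
    """Power-iteration estimate of the spectral norm of a real symmetric matrix."""
    n = len(G)
    x = [1.0 / math.sqrt(n)] * n
    norm = 0.0
    for _ in range(iters):
        y = [sum(G[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            return 0.0
        x = [v / norm for v in y]
    return norm


# Hypothetical real-valued sub-dictionary S_a with three unit-norm columns.
S_a = [
    [1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 0.6, 0.8],
]
G = gram_minus_identity(S_a)
n_a = len(S_a)
# Coherence of the selected columns (at most the coherence a of A).
a = max(abs(G[i][j]) for i in range(n_a) for j in range(n_a) if i != j)
h_a = spectral_norm_symmetric(G)
bound = (n_a - 1) * a
print(f"H_a ~ {h_a:.4f} <= (n_a - 1) a = {bound:.4f}")
```

Each row of G has a zero diagonal entry and at most $n_a - 1$ off-diagonal entries of magnitude at most a, so every Geršgorin disc is centered at zero with radius at most $(n_a - 1)a$, which is the bound in (11).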



For the second term on the RHS of (10), we can use [12, Eq. 6.1] to get

    $[E(H_b^q)]^{1/q} \leq \sqrt{144\,b^2 n_b r_1} + 2 n_b \|B\|^2/N_b$    (12)

where $r_1 = \max\{1, \log(n_b/2 + 1), q/4\}$. Assuming that $q \geq \max\{4\log(n_b/2 + 1), 4\}$ and hence $r_1 = q/4$, we can simplify (12) to

    $[E(H_b^q)]^{1/q} \leq 6\sqrt{n_b b^2 q} + \frac{2 n_b}{N_b}\|B\|^2$.    (13)

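To bound the third term on the RHS of (10), we use the upper bound on the spectral norm of a random compression [12, Thm. 8] combined with $\mathrm{rank}(S_a^H S_b) \leq n_b$. This yields

    $[E(\|S_a^H S_b\|^q)]^{1/q} \leq 3\sqrt{r_2}\,\|S_a^H B\|_{1,2} + \sqrt{n_b/N_b}\,\|S_a^H B\|$    (14)

where $r_2 = \max\{2, 2\log n_b, q/2\}$. Assuming that $q \geq \max\{4\log n_b, 4\}$, we can further bound the RHS of (14) to get

    $[E(Z^q)]^{1/q} \leq \frac{3}{\sqrt{2}}\sqrt{n_a d^2 q} + \sqrt{n_b/N_b}\,\|S_a^H B\|$    (15)
    $\leq \frac{3}{\sqrt{2}}\sqrt{n_a d^2 q} + \sqrt{n_b/N_b}\,\|A\|\|B\|$    (16)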



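where (15) follows from the fact that the magnitude of each entry of $S_a^H B$ is upper-bounded by d and, thus, $\|S_a^H B\|_{1,2} \leq \sqrt{d^2 n_a}$. To arrive at (16) we used $\|S_a^H B\| \leq \|S_a^H\|\|B\| \leq \|A\|\|B\|$, which follows from the sub-multiplicativity of the spectral norm and the fact that the spectral norm of the submatrix $S_a$ of A cannot exceed that of A. We can now combine the upper bounds (11), (13), and (16) to obtain

    $[E(H^q)]^{1/q} \leq (n_a - 1)a + 6\sqrt{n_b b^2}\,\sqrt{q} + \frac{2 n_b}{N_b}\|B\|^2 + \frac{3}{\sqrt{2}}\sqrt{n_a d^2}\,\sqrt{q} + \sqrt{\frac{n_b}{N_b}}\,\|A\|\|B\|$
    $= \left(6\sqrt{n_b b^2} + \frac{3}{\sqrt{2}}\sqrt{n_a d^2}\right)\sqrt{q} + (n_a - 1)a + \frac{2 n_b}{N_b}\|B\|^2 + \sqrt{\frac{n_b}{N_b}}\,\|A\|\|B\|$
    $= \alpha\sqrt{q} + \beta$

for all $q \geq Q_1 \triangleq \max\{4\log(n_b/2 + 1), 4\log n_b, 4\}$. Hence, Lemma 2 yields

    $P\{H \geq e^{1/4}(\alpha u + \beta)\} \leq e^{-u^2/4}$

for all $u \geq \sqrt{Q_1}$. In particular, under the assumption $N > e \approx 2.7$, it follows that the choice $u = \sqrt{4 s \log N}$ satisfies $u \geq \sqrt{Q_1}$ for any s > 1. Straightforward calculations reveal that conditions (3) and (4) ensure that $e^{1/4}(\alpha u + \beta) \leq 1/2$ (adding (3) and (4) and dividing by two gives $\alpha u + \beta \leq e^{-1/4}/2$ for this choice of u), which together with (8) then leads to

    $P\{\sigma_{\min}(S) \leq 1/\sqrt{2}\} \leq P\{H \geq 1/2\}$
    $\leq P\{H \geq e^{1/4}(\alpha u + \beta)\}$
    $\leq e^{-u^2/4} = N^{-s}$.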